Abstract. In this paper, we analyze the convergence and the rate of convergence of
asynchronous distributed quadratic programming (QP) with the dual decomposition technique.
In general, distributed optimization requires synchronization of data at each iteration step
due to the interdependency of data. This synchronization latency may incur a large amount of
waiting time caused by idle processes during computation. We aim to attack this
synchronization penalty in distributed QP problems by implementing an asynchronous update of
the dual variable. The price to pay for adopting asynchronous computing algorithms is
unpredictability of the solution, resulting in a tradeoff between speedup and accuracy. Thus,
the convergence to an optimal solution is not guaranteed owing to the stochastic behavior of
asynchrony. In this paper, we employ the switched system framework as an analysis tool to
investigate the convergence of asynchronous distributed QP. This switched system framework
facilitates the analysis of asynchronous distributed QP with dual decomposition, providing
necessary and sufficient conditions for mean square convergence. We also provide an analytic
expression for the rate of convergence through the switched system, which enables performance
analysis of asynchronous algorithms as compared with the synchronous case. To verify the
validity of the proposed methods, numerical examples are presented with an implementation of
asynchronous parallel QP using OpenMP.

Key words. Distributed Optimization, Parallel Quadratic Programming, Asynchronous Algorithm,
Dual Decomposition, Switched System

1. Introduction. Recent advances in distributed and parallel computing technologies have
brought massive processing capabilities to bear on large-scale optimization problems.
Distributed and parallel computing may reduce the computation time needed to find an optimal
solution by leveraging parallel processing. In particular, distributed optimization is likely
to be a key element for large-scale statistics and machine learning problems, currently
represented by the term "big data". One of the reasons distributed optimization is preferred
for big data is that the data set is so large that it is desirably stored in a distributed
manner. Thus, the global objective is achieved in conjunction with local objective functions
assigned to each distributed node, which requires communication between distributed nodes in
order to attain an optimal solution.

For several decades, there have been remarkable studies that have made it possible to find an
optimal solution in a decentralized fashion, for example, dual decomposition [9], [2], [12],
[19], [3], augmented Lagrangian methods for constrained optimization [21], [29], [16], [1],
the alternating direction method of multipliers (ADMM) [20], [18], [17], Spingarn's method
[30], Bregman iterative algorithms for $\ell_1$ problems [6], [8], [11], Douglas-Rachford
splitting [10], [27], and proximal methods [31]. More details about the history of the methods
listed above can be found in [5]. In this study, we mainly focus on the analysis of
asynchronous distributed optimization problems. In particular, we aim to investigate the
behavior of asynchrony in the Lagrangian dual decomposition method for distributed quadratic
programming (QP) problems, where QP problems refer to optimization problems with a quadratic
objective function and linear constraints. This type of QP problem has broad applications
including least squares with linear constraints, regression analysis and statistics, SVMs,
lasso, portfolio optimization problems, etc. With Lagrangian dual decomposition, original QP
problems that are separable can be solved in a distributed sense. For this dual decomposition
technique, we study how asynchronous computing algorithms affect the convergence as well as
the rate of convergence of the dual variable.

Typically, distributed optimization requires synchronization of the data set at each iteration
step due to the interdependency of data. For massive parallelism, this synchronization may
result in a large amount of waiting time, as load imbalance between distributed computing
resources takes place at each iteration step. In this case, nodes that have completed their
tasks must wait for others to finish their assigned jobs, which leaves computing resources
idle and wastes computation time. In this paper, we attack the synchronization penalty that
distributed and parallel computing otherwise requires, through the implementation of
asynchronous computing algorithms. Asynchronous computing algorithms, which do not suffer from
synchronization latency, thus have the potential to break through the paradigm of distributed
and parallel optimization. Unfortunately, the effect of asynchrony on the convergence, as well
as on the rate of convergence, in distributed optimization is not yet completely understood.
Due to the stochastic behavior of asynchrony, the solution of the asynchronous distributed QP
may diverge even if the synchronous scheme is guaranteed to converge to an optimal solution.
Bertsekas [4] introduced a sufficient condition for the convergence of general asynchronous
fixed-point iterations (see chapter 6.2), which is equivalent to a diagonal dominance
condition for QP problems; however, this condition is known to be very strong and thus
conservative, according to [28]. Therefore, the primary emphasis of this research is placed
on: 1) convergence analysis; 2) analytic estimation of the rate of convergence, by employing a
new framework for the analysis of distributed QP problems with an asynchronous update of the
dual variable.

For this purpose, we adopt the switched system [15], [14], [13], [25], [22], [26], [24], [23]
framework as an analysis tool. In general, a switched system is defined as a dynamical system
that consists of a set of subsystem dynamics and a switching logic that governs the switching
between subsystems. For asynchronous algorithms whose dynamics are modeled by a switched
system, the subsystem dynamics represent all possible asynchronous computations that arise
from differences in data processing time among the distributed computing devices. A switching
logic can then be implemented to represent random switching between the subsystem dynamics.
Thus, the switched system framework can be used to properly model the dynamics of asynchronous
computing algorithms. Lee et al. [24], for example, introduced the switched system to
represent the behavior of asynchrony in massively parallel numerical algorithms. In that work,
the authors applied the switched dynamical system framework in order to analyze the
convergence, rate of convergence, and error probability for asynchronous parallel numerical
algorithms. Based on this switched system framework, this paper provides a new approach for
the convergence analysis of asynchronous distributed QP problems with the dual decomposition
technique. The proposed methods guarantee convergence to the optimal solution in the mean
square sense. In addition, we study how fast each scheme (e.g., the synchronous and
asynchronous schemes) converges to an optimal solution by studying the rate of convergence in
analytic form. Therefore, this paper presents a fundamental yet important analysis of
asynchronous distributed QP problems through the switched system framework, which facilitates
investigation of the stochastic behavior of asynchrony.

The rest of this paper is organized as follows. In section 2, preliminaries are presented in
connection with problem formulations for asynchronous distributed QP problems using dual
decomposition. Section 3 introduces the switched system to model the asynchrony in the
asynchronous distributed QP problems. The results for the convergence and the rate of
convergence obtained with the switched system framework are derived in sections 4 and 5,
respectively. A numerical example with a real implementation of distributed and parallel QP
is provided in section 6, to verify the validity of the proposed methods. Finally, section 7
concludes the paper.

2. Preliminaries and Problem Formulation. Notation: The sets of real numbers, positive
integers, and non-negative integers are denoted by the symbols $\mathbb{R}$, $\mathbb{N}$, and
$\mathbb{N}_0$, respectively. The symbol $\top$ represents the transpose operator. For any real
matrices $A, B \in \mathbb{R}^{n \times n}$, the inequality $A < B$ is interpreted in the
quadratic sense (i.e., $v^\top A v < v^\top B v$ for any nonzero real vector
$v \in \mathbb{R}^n$). In addition, the symbol $\otimes$ stands for the Kronecker product.

2.1. Duality Problem. Consider the following QP problem with a linear inequality constraint.

(2.1) minimize $f(x)$

(2.2) subject to $Ax \leq b$,

where $f(x)$ is given by a quadratic form, meaning $f(x) = \frac{1}{2} x^\top Q x + c^\top x$,
the matrix $Q \in \mathbb{R}^{n \times n}$ is symmetric and positive definite, and
$c \in \mathbb{R}^n$ is a vector. Further, in the inequality constraint (2.2),
$A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. If we define the Lagrangian as
$L(x, y) = f(x) + y^\top (Ax - b)$, where $y \in \mathbb{R}^m$ is the dual variable or
Lagrange multiplier, then the dual problem for the above QP is formulated as follows.


Duality using Lagrangian:

(2.3) maximize $\inf_{x} L(x, y)$

(2.4) subject to $y \geq 0$.

The primal optimal point $x^*$ is obtained from a dual optimal point $y^*$ as

$x^* = \arg\min_{x} L(x, y^*).$


By implementing gradient ascent, one can solve the dual problem, provided that
$\inf_x L(x, y)$ is differentiable. In this case, the iteration to find $x^*$ is constructed as
follows:

(2.5) $x^{k+1} := \arg\min_{x} L(x, y^k),$

(2.6) $y^{k+1} := y^k + \alpha^k (A x^{k+1} - b),$

where $\alpha^k$ is a step size and the superscript denotes the discrete-time index of the
iteration.

For the quadratic objective function $f(x)$, the value $\arg\min_x L(x, y^k)$ can
alternatively be obtained from $\nabla_x L(x, y^k) = 0$, which leads to

$\nabla_x L(x, y^k) = \nabla_x \left( \frac{1}{2} x^\top Q x + c^\top x + (y^k)^\top (Ax - b) \right)$

$\qquad = Qx + c + A^\top y^k = 0.$

From (2.5), we have

(2.7) $x^{k+1} = -Q^{-1} (A^\top y^k + c).$

Plugging (2.7) into (2.6) results in

$y^{k+1} = y^k + \alpha^k \left( A \left( -Q^{-1} (A^\top y^k + c) \right) - b \right)$

(2.8) $\qquad = (I - \alpha^k A Q^{-1} A^\top) y^k - \alpha^k (A Q^{-1} c + b).$

With the assumption that $\alpha^k > 0$, $\forall k$, the above equation provides the solution
for $y^*$ and hence $x^*$, if $\rho(I - \alpha^k A Q^{-1} A^\top) < 1$, as follows:

$y^* = (I - \alpha^k A Q^{-1} A^\top) y^* - \alpha^k (A Q^{-1} c + b),$

(2.9) $y^* = -(A Q^{-1} A^\top)^{-1} (A Q^{-1} c + b)$ (if $A Q^{-1} A^\top$ is non-singular),
$\qquad x^* = -Q^{-1} (A^\top y^* + c).$
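As a minimal sketch of the iteration (2.6)-(2.7) and the closed form (2.9), the following Python fragment runs dual ascent on a small problem. The data (a diagonal $Q$, a single constraint row, and the constant step size `alpha`) are illustrative assumptions and do not come from the paper; they are chosen so that the constraint is active and $\rho(I - \alpha A Q^{-1} A^\top) = 0.25 < 1$.

```python
# Dual ascent for: minimize 1/2 x^T Q x + c^T x  subject to  A x <= b.
# Illustrative data (assumed, not from the paper). Q is diagonal, so Q^{-1}
# acts elementwise, and there is one constraint, so y is a scalar.
Q_diag = [2.0, 4.0]
c = [-4.0, -8.0]
A = [1.0, 1.0]
b = 1.0
alpha = 1.0  # constant step size (assumption)


def primal_update(y):
    """(2.7): x = -Q^{-1} (A^T y + c)."""
    return [-(A[i] * y + c[i]) / Q_diag[i] for i in range(2)]


def dual_update(y, x):
    """(2.6): y <- y + alpha * (A x - b)."""
    return y + alpha * (sum(A[i] * x[i] for i in range(2)) - b)


def dual_ascent(y0=0.0, iterations=60):
    y = y0
    for _ in range(iterations):
        y = dual_update(y, primal_update(y))
    return y, primal_update(y)


# Closed form (2.9) for comparison.
AQA = sum(A[i] ** 2 / Q_diag[i] for i in range(2))    # A Q^{-1} A^T
AQc = sum(A[i] * c[i] / Q_diag[i] for i in range(2))  # A Q^{-1} c
y_star = -(AQc + b) / AQA
x_star = primal_update(y_star)

y_iter, x_iter = dual_ascent()
print(y_iter, x_iter, y_star, x_star)
```

For this data the closed form gives $y^* = 4$ and $x^* = (0, 1)$, and the iterates approach the same values. The sketch follows the text in omitting any projection of $y$ onto $y \geq 0$.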

2.2. Dual Decomposition with Synchronous update. In this subsection, we consider that
$f(x) = \frac{1}{2} x^\top Q x + c^\top x$ is separable, which means

$f(x) = \sum_{i=1}^{N} f_i(x_i),$

where $x = [x_1^\top, x_2^\top, \ldots, x_N^\top]^\top$ and the variables
$x_i \in \mathbb{R}^{n_i}$, $i = 1, 2, \ldots, N$, are subvectors of $x$. Also, the matrix $A$
in (2.2) satisfies $Ax = \sum_{i=1}^{N} A_i x_i$, where $A_i$ is such that
$A = [A_1, A_2, \ldots, A_N]$.

Then, the equations (2.5) and (2.6) are updated by

(2.10) $x_i^{k+1} := \arg\min_{x_i} L_i(x_i, y^k) = -Q_i^{-1} (A_i^\top y^k + c_i),$

(2.11) $y^{k+1} := y^k + \alpha^k (A x^{k+1} - b).$


Note that when updating $x_i^{k+1}$, $i = 1, 2, \ldots, N$, each value is computed by
distributed nodes. Hence, the computation of $x_i^{k+1}$ can be processed in parallel, and
each value of $x_i^{k+1}$ is then transmitted to the master node to compute $y^{k+1}$ in the
gathering stage. Therefore, as in (2.11), updating $y^{k+1}$ requires synchronization of
$x_i^{k+1}$ across all spatial indices $i$ at time $k+1$, because $x^{k+1}$ is obtained by
stacking $x_i^{k+1}$ from $i = 1$ to $N$. Fig. 1 shows the conceptual schematic of the
synchronous update of the dual variable $y$. If a computing delay occurs for one of the
indices $i$ due to differences in processing time among the distributed nodes, the process to
update $y^{k+1}$ has to be paused until all data is received from the distributed nodes. This
implies that the more parallel computing we have, the more delays may take place, resulting in
a large amount of idle time. Consequently, this idle time for synchronization becomes dominant
compared to the pure computation time needed to solve the QP problem in parallel. For
massively parallel computing algorithms, it has been reported that the synchronization latency
may be up to 50% of the total computation time [7]. In order to mitigate or avoid this type of
restriction, which severely affects the performance in obtaining an optimal solution, we
introduce an asynchronous computing algorithm in the following subsection.

[Figure 1: two update-timing diagrams, labeled "Synchronous Algorithm" (upper) and
"Asynchronous Algorithm" (lower).]

Fig. 1. The schematic of update timing for the variable $y$; the upper one shows the
synchronous algorithm, where $q$ is the length of the maximum delay (i.e., all delays are
bounded by $q$); the bottom one shows the asynchronous algorithm. The time to compute $y^k$ is
given by 1 CPU time.
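The synchronous scheme (2.10)-(2.11) can be sketched in Python as follows. This is only an illustration under stated assumptions: the two scalar nodes, their data, and the constant step size `alpha` are hypothetical, and a thread pool stands in for the distributed nodes. Collecting all results of `pool.map` before the $y$-update plays the role of the synchronization barrier described above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical separable data: node i owns scalar (Q_i, c_i, A_i).
nodes = [
    {"Q": 2.0, "c": -4.0, "A": 1.0},
    {"Q": 4.0, "c": -8.0, "A": 1.0},
]
b = 1.0
alpha = 1.0  # constant step size (assumption)


def node_update(node, y):
    """(2.10): x_i = -Q_i^{-1} (A_i^T y + c_i), computed locally on node i."""
    return -(node["A"] * y + node["c"]) / node["Q"]


def synchronous_dual_decomposition(iterations=60):
    y = 0.0
    x = [0.0] * len(nodes)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        for _ in range(iterations):
            # list(...) blocks until every node has returned its x_i, so the
            # master idles while the slowest node is still computing.
            x = list(pool.map(node_update, nodes, [y] * len(nodes)))
            # (2.11): gather A x = sum_i A_i x_i and update the dual variable.
            gathered = sum(n["A"] * xi for n, xi in zip(nodes, x))
            y = y + alpha * (gathered - b)
    return y, x


y_sync, x_sync = synchronous_dual_decomposition()
print(y_sync, x_sync)
```

With this data the iterates approach $y = 4$ and $x = (0, 1)$. In a real deployment the per-node work would differ in duration, and the time spent inside the blocking gather is exactly the idle time that the asynchronous update tries to remove.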

2.3. Dual Decomposition with Asynchronous update. In order to alleviate this synchronization
penalty, we consider an asynchronous update of the dual variable $y$. In this case, the master
node that computes $y^{k+1}$ does not wait until all $x_i^{k+1}$ are gathered. Rather, it
proceeds with the value of $x_i$ saved in the buffer memory. Thus, the value of $y$ is updated
asynchronously. To model the asynchronous dynamics of dual decomposition, we consider the new
state vector as follows.


• The state for the Asynchronous model:

$x^{k^*} := \left[ (x_1^{k_1^*})^\top, (x_2^{k_2^*})^\top, \ldots, (x_N^{k_N^*})^\top \right]^\top,$

where $k_i^* \in \{k, k-1, \ldots, k-q+1\}$, $i = 1, 2, \ldots, N$, denotes the delay term that
may take place due to the load imbalance in distributed nodes, and the term
$q \in \mathbb{N}$ represents the maximum possible delay.


For this asynchronous case, the $y$-update is given by

(2.12) $y^{k+1} = y^k + \sum_{i=1}^{N} \left( \alpha_i^k A_i x_i^{k_i^*+1} \right) - \alpha^k b,$

where $\alpha_i^k$ is the step size for the index $i$.
[Figure 2: schematic of the distributed nodes and the master node. Legend:
$q$: the maximum possible delay;
$x_i^k$: the value of $x_i$ at time $k$;
$x_i^{k_i^*}$: the random variable such that $x_i^{k_i^*} \in \{x_i^k, x_i^{k-1}, \ldots, x_i^{k-q+1}\}$;
$\Pi_i := [(\pi_1)_i, (\pi_2)_i, \ldots, (\pi_q)_i]$, where $(\pi_j)_i$, $j = 1, \ldots, q$,
stands for the modal probability for $x_i^{k_i^*}$.]

Fig. 2. The schematic of the stochastic asynchronous algorithm in the distributed quadratic
programming. In this figure, the maximum delay is bounded by $k - q + 1 \leq k_i^* \leq k$,
$\forall i$. Each node has the probability $\Pi_i$ to represent random delays.


Although $\alpha_i^k$ may vary at each time step, we let $\alpha_i^k$ be a constant value,
denoted by $\alpha_i$, for simplicity. Hence, it satisfies that $\alpha :=$ which is a fixed value.

There are two different ways to update the dual variable $y$. Throughout the paper, we refer
to these two cases as the deterministic asynchronous algorithm and the stochastic asynchronous
algorithm, respectively, in order to differentiate them. The deterministic asynchronous
algorithm stands for the case where the variable $k_i^*$ is a constant value, given by
$k_i^* := k - q + 1$, $\forall i$. Thus, it leads to
$x^{k^*} := [(x_1^{k-q+1})^\top, (x_2^{k-q+1})^\top, \ldots, (x_N^{k-q+1})^\top]^\top$. In this
case, it is assumed that the value $x_i^{k-q+1}$, which is a $q$-step prior value of $x_i^k$,
is always available to the master node. In other words, all delays are assumed to be bounded
by the finite value $q$. Therefore, one can proceed with the $y$-update given in (2.12)
without synchronization when applying the deterministic asynchronous algorithm. Note that
there is no randomness in the deterministic asynchronous algorithm. Although this
deterministic case removes the unnecessary idle time by avoiding synchronization, it always
uses the $q$-step prior values saved in the buffer memory. In a real implementation of
distributed optimization, however, $k_i^*$ varies across distributed nodes and also changes
over the iterations. Thus, we consider another case by letting $x_i^{k_i^*}$ be a random
vector, where $k_i^*$ takes one of the values in the given set $\{k, k-1, \ldots, k-q+1\}$. To
distinguish this case from the deterministic asynchronous algorithm, it is referred to as the
stochastic asynchronous algorithm.

Fig. 2 describes the conceptual schematic of the stochastic asynchronous algorithm using dual
decomposition in the QP problem. Depending on the processing capability and load balance of
the distributed nodes, the value of $x_i^k$ may or may not be available in the master node at
each iteration step. We assume that this delay is bounded by the finite value $q$. To describe
the randomness of such delays, we adopt a probability
$\Pi_i := [(\pi_1)_i, (\pi_2)_i, \ldots, (\pi_q)_i] \in \mathbb{R}^{1 \times q}$ that predicts
which value of $x_i$ will be used to update $y^{k+1}$, as shown in Fig. 2.

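Starting from (2.12), with the definition of the set $S^{k} := \{i \mid k_i^* = k\}$ and the
symbol $\Phi_i := \alpha_i A_i Q_i^{-1} A_i^\top$, the state dynamics of the stochastic
asynchronous algorithm is then given by

$y^{k+1} = y^k + \sum_{i=1}^{N} \alpha_i A_i x_i^{k_i^*+1} - \alpha b$

$\qquad = y^k + \sum_{i \in S^{k}} \alpha_i A_i x_i^{k+1} + \sum_{i \in S^{k-1}} \alpha_i A_i x_i^{k} + \cdots + \sum_{i \in S^{k-q+1}} \alpha_i A_i x_i^{k-q+2} - \alpha b$

(2.13) $\qquad = \left( I - \sum_{i \in S^{k}} \Phi_i \right) y^k - \left( \sum_{i \in S^{k-1}} \Phi_i \right) y^{k-1} - \cdots - \left( \sum_{i \in S^{k-q+1}} \Phi_i \right) y^{k-q+1} - \left( \sum_{i=1}^{N} \alpha_i A_i Q_i^{-1} c_i + \alpha b \right)$ (by (2.10)).

The above equation is simplified by the following definitions, given by

(2.14) $R_j(k) := \sum_{i \in S^{k-j+1}} \Phi_i, \quad j = 1, 2, \ldots, q,$

(2.15) $B := -\left( \sum_{i=1}^{N} \alpha_i A_i Q_i^{-1} c_i + \alpha b \right),$

resulting in

(2.16) $y^{k+1} = (I - R_1(k)) y^k - R_2(k) y^{k-1} - \cdots - R_q(k) y^{k-q+1} + B,$

where the time-varying matrix $R_j(k)$ completely depends on the value $k_i^*$, which is a
random event.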
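To make the stochastic asynchronous update concrete, the following Python sketch simulates the delayed recursion for two scalar nodes with $q = 2$. Everything numerical here is an assumption made for illustration: the node data, the common step size `alpha` (i.e., $\alpha_i = \alpha$ for all $i$), the delay probabilities `delay_probs` standing in for $\Pi_i$, and the seeded generator that replaces real load imbalance.

```python
import random

# Hypothetical scalar data for N = 2 nodes.
nodes = [
    {"Q": 2.0, "c": -4.0, "A": 1.0},
    {"Q": 4.0, "c": -8.0, "A": 1.0},
]
b = 1.0
alpha = 0.5   # common constant step size (assumption: alpha_i = alpha)
q = 2         # maximum delay: each node's value is based on y^k or y^{k-1}
delay_probs = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical Pi_i, one row per node


def stochastic_async(iterations=500, seed=0):
    rng = random.Random(seed)
    history = [0.0] * q  # history[d] holds y^{k-d}
    for _ in range(iterations):
        increment = 0.0
        for node, probs in zip(nodes, delay_probs):
            d = rng.choices(range(q), weights=probs)[0]  # sampled delay
            y_stale = history[d]
            # (2.10) evaluated with the possibly outdated dual variable.
            x_i = -(node["A"] * y_stale + node["c"]) / node["Q"]
            increment += alpha * node["A"] * x_i
        y_next = history[0] + increment - alpha * b
        history = [y_next] + history[:-1]  # shift the buffer by one step
    return history[0]


y_async = stochastic_async()
print(y_async)
```

For this particular data every possible delay pattern is contractive, so the iterates approach the same value $y = 4$ as the synchronous scheme regardless of the seed. That property is specific to the chosen numbers and step size and should not be read as a general guarantee.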

As described in [4], it is a very challenging task to analyze the stochastic asynchronous
algorithm (see page 101, chapter 1). The primary goal of this paper is, therefore, to analyze
not only the convergence but also the rate of convergence of the stochastic asynchronous
algorithm, which turns the state $y^k$ into a stochastic process. For this purpose, we adopt a
switched linear system (or, interchangeably, jump linear system) framework, which is
introduced in more detail in the next section.

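3. A Switched System Approach for Asynchronous Computing Algorithms. In order to solve the
dual decomposition problem with random delays in distributed nodes, we define a new augmented
state $Y^k := [(y^k)^\top, (y^{k-1})^\top, \ldots, (y^{k-q+1})^\top]^\top$.

Then, one can define the following recursive dynamics:

(3.1)
$$
\underbrace{\begin{bmatrix} y^{k+1} \\ y^{k} \\ y^{k-1} \\ \vdots \\ y^{k-q+2} \end{bmatrix}}_{=Y^{k+1}}
=
\underbrace{\begin{bmatrix}
I - R_1(k) & -R_2(k) & -R_3(k) & \cdots & -R_q(k) \\
I & 0 & 0 & \cdots & 0 \\
0 & I & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I & 0
\end{bmatrix}}_{=W(k)}
\underbrace{\begin{bmatrix} y^{k} \\ y^{k-1} \\ y^{k-2} \\ \vdots \\ y^{k-q+1} \end{bmatrix}}_{=Y^{k}}
+
\underbrace{\begin{bmatrix} B \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{=C},
$$

where $I$ and $0$ are identity and zero matrices with proper dimensions, respectively.
Consequently, the above recursive equation ends up with the following simple form:

$\Rightarrow Y^{k+1} = W(k) Y^k + C.$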


In fact, the structure of the time-varying matrix $W(k)$ is not arbitrary; it takes one of a
finite number of forms, given by $q^N$, which counts all possible ways to distribute the $N$
matrices $\Phi_i$, $i = 1, 2, \ldots, N$, among the $q$ block columns. In the switched system,
this number is referred to as the "switching mode number", and we denote it by the symbol $m$.
For instance, when $q = 2$ and $N = 2$, the switching mode number is $m = 2^2 = 4$. Thus, at
each time $k$, the matrix $W(k)$ has one of the following forms:

$$
W_1 = \begin{bmatrix} I - \Phi_1 - \Phi_2 & 0 \\ I & 0 \end{bmatrix}, \quad
W_2 = \begin{bmatrix} I - \Phi_1 & -\Phi_2 \\ I & 0 \end{bmatrix},
$$

$$
W_3 = \begin{bmatrix} I - \Phi_2 & -\Phi_1 \\ I & 0 \end{bmatrix}, \quad
W_4 = \begin{bmatrix} I & -\Phi_1 - \Phi_2 \\ I & 0 \end{bmatrix}.
$$

Then, only one out of the set of matrices $\{W_r\}_{r=1}^{m}$ will be used at each time $k$ to
update the system state $Y^k$, which results in the switched linear system structure as
follows.

Consider the switched system:

(3.2) $Y^{k+1} = W_{\sigma_k} Y^k + C, \quad \sigma_k \in \{1, 2, \ldots, m\}, \quad k \in \mathbb{N}_0,$

where $\{\sigma_k\}$ denotes the switching sequence that describes how the asynchrony takes
place. Then, the switching probability
$\Pi(k) := \Pi_1(k) \otimes \Pi_2(k) \otimes \cdots \otimes \Pi_N(k) = [\pi_1(k), \pi_2(k), \ldots, \pi_m(k)]$,
where $\Pi_i(k)$ represents the probability for $x_i^{k_i^*}$ as depicted in Fig. 2,
determines which mode $\sigma_k$ will be utilized at each time step. (Note that $\Pi_i(k)$,
and hence $\Pi(k)$, is not necessarily stationary.) In this case, the switched linear system
is called a "stochastic switched linear system" or "stochastic jump linear system" [26]
because the switching is a stochastic process. The benefit of applying this stochastic
switched linear system structure is that the delay in the asynchronous algorithm is naturally
taken into account by the switched system framework. Hence, the randomness of the asynchronous
algorithm is represented by a certain switching logic.

Remark 3.1. (Computational complexity due to an extremely large number of the switching
modes) Although the stochastic switched linear system framework is suitable for modeling the
dynamics of the stochastic asynchronous algorithm in distributed QP problems, it results in an
extremely large number of switching modes, causing computational complexity. For instance,
even if $q = 2$ and $N = 20$, we have $m = q^N = 2^{20}$, and it is impractical to store such
a large number of matrices in a real implementation. Therefore, it is necessary to develop
proper methods to analyze the stochastic asynchronous algorithm using the switched linear
system without any concerns about such computational complexity issues.
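The mode count $m = q^N$ can be illustrated by enumerating every assignment of delays to nodes and assembling the corresponding matrix $W$ of (3.1). The Python sketch below does this for scalar $\Phi_i$; the function name `build_modes` and the numerical values are hypothetical and chosen only for illustration.

```python
from itertools import product


def build_modes(phis, q):
    """Map each delay assignment (one delay per node) to its q x q matrix W.

    phis holds scalar Phi_i values; delay d means node i's value is based on
    y^{k-d}, so Phi_i lands in block column d + 1 of the first block row.
    """
    modes = {}
    for delays in product(range(q), repeat=len(phis)):
        R = [0.0] * q  # R[j] plays the role of R_{j+1}(k)
        for phi, d in zip(phis, delays):
            R[d] += phi
        first_row = [-r for r in R]
        first_row[0] += 1.0  # I - R_1(k) in the (1, 1) block
        # Shift rows [I 0 ... 0], [0 I ... 0], ... that move the history down.
        shift_rows = [[1.0 if col == row else 0.0 for col in range(q)]
                      for row in range(q - 1)]
        modes[delays] = [first_row] + shift_rows
    return modes


modes = build_modes(phis=[0.25, 0.125], q=2)
for delays, W in modes.items():
    print(delays, W)

# The number of modes grows as q**N; for q = 2 and N = 20 it is already 2**20.
print(len(modes), 2 ** 20)
```

For $q = 2$ and $N = 2$ the four matrices produced correspond to $W_1, \ldots, W_4$ above, with `(0, 0)` giving $W_1$ and `(1, 1)` giving $W_4$. Enumerating the modes this way is only feasible for small $N$, which is the difficulty the remark points out.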

To avoid the computational complexity problems stated above, we first make the following
assumption for the analysis of both the convergence and the rate of convergence of the
stochastic asynchronous algorithm:

• Assumption 3.1. We consider the random delays that occur during the computation of
$x_i^k$ at each node. In this case, the probability
$\Pi_i(k) = [(\pi_1(k))_i, (\pi_2(k))_i, \ldots, (\pi_q(k))_i]$ describes which value of
$x_i^{k_i^*}$ will be used among the given set $\{x_i^k, x_i^{k-1}, \ldots, x_i^{k-q+1}\}$.
Then, we assume that each modal probability $(\pi_j(k))_i$ is stationary, and hence
$\Pi_i(k)$ is also stationary in time.

Under Assumption 3.1, the switching probability
$\Pi(k) := \Pi_1 \otimes \Pi_2 \otimes \cdots \otimes \Pi_N$ becomes stationary. For this
case, the jump linear system with the dynamics given in (3.2) is termed the independent,
identically distributed (i.i.d.) jump linear system. Since the modal switching probability
$\pi_r$ is a probability, it satisfies $0 \leq \pi_r \leq 1$, $\forall r$, and
$\sum_{r=1}^{m} \pi_r = 1$. This stationary occupation probability governs which system matrix
$W_r$ will be used at each instance. The implementation of the switching sequence
$\{\sigma_k\}$, governed by $\Pi$, describes the randomness of the stochastic asynchronous
algorithm in an average sense.


4. Convergence Analysis. In this section, the convergence of the state $Y^k$ for the
stochastic asynchronous model is studied under the switched system framework. For several
decades, stability results for switched systems with stochastic jumping parameters have been
well established, for example, in [25], [26], [15]. However, these methods are inapplicable to
asynchronous computing algorithms with massive parallelism because such algorithms result in
an extremely large number of switching modes, leading to computational complexity as
explained in Remark 3.1. Therefore, we aim to investigate the convergence and the rate of
convergence of the asynchronous algorithm without any concerns about such computational
complexity issues. In particular, this section provides a convergence condition for the
stochastic asynchronous algorithm in distributed QP problems.

Before proceeding further to investigate the asynchronous model, we analyze the convergence of
the synchronous case without delays for reference. Since in the synchronous algorithm all
values are synchronized after each iteration, no delays occur when updating the state
dynamics. Then, the state $Y_{sync.}^k$ for the synchronous case is governed by the following
recursive equation:

(4.1)
$$
\underbrace{\begin{bmatrix} y^{k+1} \\ y^{k} \\ y^{k-1} \\ \vdots \\ y^{k-q+2} \end{bmatrix}}_{=Y_{sync.}^{k+1}}
=
\underbrace{\begin{bmatrix}
I - R & 0 & 0 & \cdots & 0 \\
I & 0 & 0 & \cdots & 0 \\
0 & I & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I & 0
\end{bmatrix}}_{=W_{sync.}}
\underbrace{\begin{bmatrix} y^{k} \\ y^{k-1} \\ y^{k-2} \\ \vdots \\ y^{k-q+1} \end{bmatrix}}_{=Y_{sync.}^{k}}
+
\underbrace{\begin{bmatrix} B \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{=C},
$$

where the matrix $R := \sum_{j=1}^{q} R_j(k)$ is time-invariant, and hence the matrix
$W_{sync.}$ is also constant. Then, the steady-state value
$Y_{sync.}^* := \lim_{k \to \infty} Y_{sync.}^k$ is obtained by

(4.2) $Y_{sync.}^* = W_{sync.} Y_{sync.}^* + C$

$\Rightarrow Y_{sync.}^* = (I - W_{sync.})^{-1} C$ (if $(I - W_{sync.})$ is non-singular),

if the condition $\rho(W_{sync.}) < 1$ holds.

However, the state $Y^k$ in the i.i.d. switched linear system that represents the stochastic
asynchronous model evolves with the dynamics given in (3.2), where the matrix $W_{\sigma_k}$
is determined by the switching probability $\Pi$. Thus, the state of the asynchronous model
becomes a random vector, which complicates the convergence analysis of the stochastic
asynchronous model. For stochastic switched systems, various convergence (stability) notions
have been developed [15] to guarantee system stability. Among the different convergence
notions, we focus on mean square convergence, defined below.

Definition 4.1. (Definition 1.1, [13]) The switched system is said to be mean square stable
(convergent) if for any initial condition $x_0$ and arbitrary initial probability distribution
$\Pi(0)$, $\lim_{k \to \infty} E\left[ \| x(k, x_0) - x^* \|^2 \right] = 0$, where $x^*$ is
the fixed-point value of $x^k$, i.e. $\lim_{k \to \infty} x^k = x^*$.

The necessary and sufficient condition for the mean square convergence of i.i.d. jump linear
systems is described as follows:

Proposition 4.2. (Corollary 2.7, [14]) Consider an i.i.d. jump linear system, where $\Pi(k)$
is a stationary probability vector $\{\pi_1, \pi_2, \ldots, \pi_m\}$ for all $k$. Then, the
i.i.d. jump linear system is mean square stable (convergent) if and only if the matrix
$\sum_{j=1}^{m} \pi_j (W_j \otimes W_j)$ is Schur stable, i.e.

(4.3) $\rho\left( \sum_{j=1}^{m} \pi_j (W_j \otimes W_j) \right) < 1.$
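Condition (4.3) can be checked numerically for a small number of modes. The Python sketch below is one way to do so using only the standard library: it forms $\sum_j \pi_j (W_j \otimes W_j)$ and estimates the spectral radius through Gelfand's formula $\rho(M) = \lim_n \|M^n\|^{1/n}$ with repeated squaring. This estimator is a substitute for an eigenvalue solver and approaches $\rho$ from above, so values very close to 1 would need a more careful method. The mode matrices and probabilities are illustrative assumptions (scalar $\Phi_1 = 0.25$, $\Phi_2 = 0.125$, $q = 2$, $N = 2$).

```python
import math


def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inf_norm(A):
    return max(sum(abs(v) for v in row) for row in A)


def spectral_radius_estimate(M, squarings=12):
    """Estimate rho(M) as ||M^n||^(1/n) with n = 2**squarings.

    The running power is renormalized after each squaring to avoid
    underflow; log_scale tracks log ||M^n||.
    """
    norm = inf_norm(M)
    if norm == 0.0:
        return 0.0
    P = [[v / norm for v in row] for row in M]
    log_scale = math.log(norm)
    for _ in range(squarings):
        P = matmul(P, P)
        norm = inf_norm(P)
        if norm == 0.0:
            return 0.0
        P = [[v / norm for v in row] for row in P]
        log_scale = 2.0 * log_scale + math.log(norm)
    return math.exp(log_scale / 2 ** squarings)


def mean_square_stable(modes, probs):
    """Evaluate (4.3): rho(sum_j pi_j (W_j kron W_j)) < 1."""
    size = len(modes[0]) ** 2
    S = [[0.0] * size for _ in range(size)]
    for W, p in zip(modes, probs):
        K = kron(W, W)
        for r in range(size):
            for col in range(size):
                S[r][col] += p * K[r][col]
    rho = spectral_radius_estimate(S)
    return rho, rho < 1.0


W_modes = [
    [[0.625, 0.0], [1.0, 0.0]],    # W_1: both nodes up to date
    [[0.75, -0.125], [1.0, 0.0]],  # W_2: node 2 delayed
    [[0.875, -0.25], [1.0, 0.0]],  # W_3: node 1 delayed
    [[1.0, -0.375], [1.0, 0.0]],   # W_4: both nodes delayed
]
pi = [0.28, 0.42, 0.12, 0.18]      # hypothetical stationary mode probabilities
rho, stable = mean_square_stable(W_modes, pi)
print(rho, stable)
```

The cost of this check scales with the number of modes $m$, so applying it directly to $m = q^N$ modes is only practical for small $N$.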


Once again, massive parallelism results in a large $m$, causing computational intractability.
Thus, applying Proposition 4.2 is infeasible for the analysis of asynchronous distributed and
parallel QP problems with massively parallel computing algorithms, because the expression in
(4.3) requires the summation over the index $j$ from 1 up to $m$. In order to avoid this
problem, we provide Algorithm 1.

By executing Algorithm 1 at every time step in the master node, the random vector $x^{k^*}$
has the following form: $x^{k^*} = [(x_1^{\xi})^\top, (x_2^{\xi})^\top, \ldots, (x_N^{\xi})^\top]^\top$,
where $\xi$ denotes the oldest time among the recently updated values across the index
$i = 1, 2, \ldots, N$.

Algorithm 1
  $k_i^* \leftarrow$ one of the values in $\{k, k-1, \ldots, k-q+1\}$ with probability $\Pi_i$, for $i = 1, 2, \ldots, N$
  $\xi \leftarrow k$
  for $i \leq N$ do
    if $\xi > k_i^*$ then
      $\xi \leftarrow k_i^*$
    end if
    $i \leftarrow i + 1$
  end for
  $x^{k^*} \leftarrow [(x_1^{\xi})^\top, (x_2^{\xi})^\top, \ldots, (x_N^{\xi})^\top]^\top$
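A direct Python transcription of Algorithm 1 might look as follows. It is a sketch under assumptions: the helper names `oldest_index` and `stale_vector`, the per-node buffer layout `buffer[i][t]`, and the use of a seeded random generator to draw $k_i^*$ are all hypothetical, since in a real run $k_i^*$ is determined by which values have actually arrived at the master node.

```python
import random


def oldest_index(k, q, delay_probs, rng):
    """Draw k_i^* for every node and return (samples, xi) with xi = min_i k_i^*."""
    candidates = [k - d for d in range(q)]  # {k, k-1, ..., k-q+1}
    samples = [rng.choices(candidates, weights=probs)[0]
               for probs in delay_probs]
    xi = k
    for k_star in samples:
        if xi > k_star:   # keep the oldest time index seen so far
            xi = k_star
    return samples, xi


def stale_vector(buffer, xi):
    """Stack x_i^{xi} over all nodes; buffer[i][t] is node i's value at time t."""
    return [buffer[i][xi] for i in range(len(buffer))]


rng = random.Random(1)
delay_probs = [[0.2, 0.3, 0.5], [1.0, 0.0, 0.0]]  # hypothetical Pi_1, Pi_2
samples, xi = oldest_index(k=10, q=3, delay_probs=delay_probs, rng=rng)

# Toy buffer holding a placeholder value (i, t) for node i at time t.
buffer = [{t: (i, t) for t in range(8, 11)} for i in range(2)]
print(samples, xi, stale_vector(buffer, xi))
```

Using the same time index $\xi$ for every node is what collapses the $q^N$ delay patterns into only $q$ modes, at the price of sometimes discarding fresher values that are already in the buffer.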



For example, if $k_i^* = k - 2$ for some $i$ is the oldest value over all $k_i^*$,
$i = 1, 2, \ldots, N$, then we have
$x^{k^*} = [(x_1^{k-2})^\top, (x_2^{k-2})^\top, \ldots, (x_N^{k-2})^\top]^\top$. In this case,
the modal matrix $W_r$ has the same structure as $W(k)$, given in (3.1), where $R_j(k)$
satisfies

$$
R_j(k) = \begin{cases} R, & \text{if } j = k - \xi + 1, \\ 0, & \text{(otherwise)}. \end{cases}
$$

The use of Algorithm 1 then drastically reduces the switching mode number to $q$ regardless of
the value $N$, because at each iteration step we intentionally use the oldest updated value
saved in the buffer memory. For example, when $q = 2$, the matrix $W_r$ takes one of the
following forms:

$$
W_1 = \begin{bmatrix} I - R & 0 \\ I & 0 \end{bmatrix}, \quad
W_2 = \begin{bmatrix} I & -R \\ I & 0 \end{bmatrix}.
$$

Since Algorithm 1 works as if it aggregates some subsets of the given switching modes, we need
to redefine the switching probability $\Pi$ accordingly. Then, $\Pi$ is obtained by the
following theorem.

Theorem 4.3. Consider the i.i.d. switched linear system given in (3.2) with the switching
probability $\Pi = \Pi_1 \otimes \Pi_2 \otimes \cdots \otimes \Pi_N \in \mathbb{R}^{1 \times q^N}$.
After the implementation of Algorithm 1, the switching probability is redefined by
$\Pi := [\pi_1, \pi_2, \ldots, \pi_q] \in \mathbb{R}^{1 \times q}$, of which the modal
probability $\pi_r$ has the following form:

(4.4) $\pi_r = \prod_{i=1}^{N} \left( \sum_{j=1}^{r} (\pi_j)_i \right) - \prod_{i=1}^{N} \left( \sum_{j=1}^{r-1} (\pi_j)_i \right), \quad r = 1, 2, \ldots, q,$

where the term $(\pi_j)_i$ denotes the modal probability for $\Pi_i$ (i.e.,
$\Pi_i = [(\pi_1)_i, (\pi_2)_i, \ldots, (\pi_q)_i]$).

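Proof. For simplicity of the proof, we assume that $N = 2$. The most general case is then
proved similarly by induction. In this case, the master node takes the values for each
$x_i^{k_i^*}$ according to the probability $\Pi_i$, $i = 1, 2$, which are given by

$\Pi_1 = [(\pi_1)_1, (\pi_2)_1, \ldots, (\pi_q)_1],$
$\Pi_2 = [(\pi_1)_2, (\pi_2)_2, \ldots, (\pi_q)_2].$

We let $\xi \in \{k, k-1, \ldots, k-q+1\}$ be the value explained in Algorithm 1, and let
$r := k - \xi + 1 \in \{1, 2, \ldots, q\}$ be the corresponding mode index.

When $r = 1$, the modal switching probability $\pi_1$ is obtained by

$\pi_1 = \Pr(k_1^* = k, \; k_2^* = k)$

$\quad = \Pr(k_1^* = k) \times \Pr(k_2^* = k)$ (since $k_1^*$ and $k_2^*$ are independent)

$\quad = (\pi_1)_1 \times (\pi_1)_2.$

Similarly, when $r = 2$, we have

$\pi_2 = \Pr(k_1^* \in \{k, k-1\}, \; k_2^* \in \{k, k-1\}) - \pi_1$

$\quad = \sum_{j=1}^{2} \Pr(k_1^* = k - j + 1) \times \sum_{j=1}^{2} \Pr(k_2^* = k - j + 1) - \pi_1$

$\quad = \left( \sum_{j=1}^{2} (\pi_j)_1 \right) \left( \sum_{j=1}^{2} (\pi_j)_2 \right) - \pi_1.$

In the first line of the above equation, we have to subtract $\pi_1$ because it corresponds to
the case $r = 1$.

For any value $r \in \{1, 2, \ldots, q\}$, the switching probability is therefore obtained by
induction as follows:

$\pi_r = \Pr(k_1^* \in \{k, k-1, \ldots, k-r+1\}, \; k_2^* \in \{k, k-1, \ldots, k-r+1\}) - \sum_{j=1}^{r-1} \pi_j$

$\quad = \left( \sum_{j=1}^{r} (\pi_j)_1 \right) \left( \sum_{j=1}^{r} (\pi_j)_2 \right) - \left( \sum_{j=1}^{r-1} (\pi_j)_1 \right) \left( \sum_{j=1}^{r-1} (\pi_j)_2 \right).$

Thus, the most general case with $q, N \in \mathbb{N}$ can be induced as follows:

$\pi_r = \prod_{i=1}^{N} \left( \sum_{j=1}^{r} (\pi_j)_i \right) - \prod_{i=1}^{N} \left( \sum_{j=1}^{r-1} (\pi_j)_i \right), \quad r = 1, 2, \ldots, q.$

□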
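Formula (4.4) is easy to check numerically against a brute-force enumeration of all $q^N$ joint delay outcomes, grouped by the largest delay among the nodes. The Python sketch below does this; the function names and the probability rows are hypothetical, and delays are indexed from 0 so that mode $r$ corresponds to a largest delay of $r - 1$.

```python
from itertools import product
from math import prod


def aggregated_probs(delay_probs):
    """(4.4): pi_r = prod_i sum_{j<=r} (pi_j)_i - prod_i sum_{j<=r-1} (pi_j)_i."""
    q = len(delay_probs[0])

    def cumulative(r):
        # Probability that every node has a delay index of at most r.
        return prod(sum(p[:r]) for p in delay_probs)

    return [cumulative(r) - cumulative(r - 1) for r in range(1, q + 1)]


def brute_force_probs(delay_probs):
    """Enumerate all q^N joint outcomes and group them by the largest delay."""
    q = len(delay_probs[0])
    totals = [0.0] * q
    for delays in product(range(q), repeat=len(delay_probs)):
        joint = prod(p[d] for p, d in zip(delay_probs, delays))
        totals[max(delays)] += joint
    return totals


Pi = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.25, 0.25, 0.5]]  # hypothetical Pi_i
print(aggregated_probs(Pi))
print(brute_force_probs(Pi))
```

The closed form needs on the order of $qN$ operations, whereas the enumeration visits $q^N$ outcomes, which mirrors the reduction in the number of switching modes obtained with Algorithm 1.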

For comparison, the switching mode number without the proposed algorithm is given by
$m = q^N$, which grows exponentially with respect to $N$, whereas with the proposed
Algorithm 1 it is given by $m = q$, which is a constant value irrespective of $N$. Thus, by
leveraging the proposed algorithm, one can apply the mean square convergence condition given
in Proposition 4.2 to test the stability of the stochastic asynchronous algorithm. Note that
the implementation of Proposition 4.2 was computationally intractable without Algorithm 1 due
to the large value of $m$. Consequently, the proposed algorithm enables the convergence
analysis of the stochastic asynchronous parallel computing algorithm in QP problems.

Once the condition (4.3) is guaranteed with a given i.i.d. switching probability $\Pi$ by
implementing Algorithm 1, the steady-state (fixed-point) value
$Y^* := \lim_{k \to \infty} Y^k$, where $Y^k$ is the state of the stochastic asynchronous
algorithm whose dynamics is given in (3.2), can be obtained according to Definition 4.1 and is
given by

(4.5) $Y^* = W_{\sigma_k} Y^* + C$

$\Rightarrow Y^* = (I - W_{\sigma_k})^{-1} C.$

Interestingly, $Y^*$ is a unique vector, regardless of $\sigma_k$ that changes over time, due
to the inherent structure of the matrices $W_{\sigma_k}$ and $C$, which results in
$Y^* = Y_{sync.}^*$, where $Y_{sync.}^*$ is defined in (4.2). Therefore, the state of the
stochastic asynchronous algorithm, denoted by $Y^k$, converges to the unique, identical
fixed-point value $Y^*$, if the condition (4.3) holds.
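The claim that every mode shares the same fixed point can be verified on a small instance. In the Python sketch below the mode matrices, the scalar $\Phi_1 = 0.25$, $\Phi_2 = 0.125$, and $B = 1.5$ are illustrative assumptions for $q = 2$, $N = 2$, and a hand-written $2 \times 2$ solver replaces a general linear-algebra routine.

```python
def solve_2x2(M, rhs):
    """Solve M z = rhs for a 2 x 2 matrix by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]


def fixed_point(W, C):
    """(4.5): Y* = (I - W)^{-1} C for a 2 x 2 mode matrix W."""
    I_minus_W = [[(1.0 if r == col else 0.0) - W[r][col] for col in range(2)]
                 for r in range(2)]
    return solve_2x2(I_minus_W, C)


B = 1.5  # illustrative constant term of (2.15)
modes = {
    "W1 (no delay)": [[0.625, 0.0], [1.0, 0.0]],
    "W2 (node 2 delayed)": [[0.75, -0.125], [1.0, 0.0]],
    "W3 (node 1 delayed)": [[0.875, -0.25], [1.0, 0.0]],
    "W4 (both delayed)": [[1.0, -0.375], [1.0, 0.0]],
}
fixed_points = {name: fixed_point(W, [B, 0.0]) for name, W in modes.items()}
for name, Y in fixed_points.items():
    print(name, Y)
```

Each mode returns the same vector here because the first block row of every $I - W_r$ sums to $\Phi_1 + \Phi_2$ and the remaining rows force all components of $Y^*$ to be equal, so the delay pattern only changes how the fixed point is approached.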

5. Rate of Convergence Analysis. Since the rate of convergence provides information about how
fast each scheme converges to the fixed-point value, it works as a guideline that suggests
which method will solve the given QP problem faster than the other schemes. Therefore,
comparing the rate of convergence between different schemes is advantageous for estimating the
time needed to obtain an optimal solution of the QP problem. Although asynchronous algorithms
are considered to be more time-efficient for obtaining an optimal solution, their rate of
convergence has not yet been established analytically. Therefore, in this section we
investigate the rate of convergence of three different algorithms (i.e., the synchronous,
deterministic asynchronous, and stochastic asynchronous algorithms) in analytic form.

i) Synchronous algorithm with delays: 

For the synchronous scheme, $Y^k$ is updated only after a certain amount of time because of the idle time needed for synchronization. As described in Fig. 1, we assume that all data from the distributed nodes arrive at the master node within a bounded time $q$. In this case, the idle process time for the synchronization is given by $q$, and $Y^k$ can be updated at every $t(q+1)$ time step, where $t \in \mathbb{N}_0$. Consequently, the $Y$-update is given by

$$\begin{aligned}
\text{at time } t = 1:&\quad Y^{(q+1)} = W_{\text{sync.}} Y^{0} + C \\
\text{at time } t = 2:&\quad Y^{2(q+1)} = W_{\text{sync.}} Y^{(q+1)} + C \\
\text{at time } t = 3:&\quad Y^{3(q+1)} = W_{\text{sync.}} Y^{2(q+1)} + C \\
\text{at arbitrary time } t + 1:&\quad Y^{(t+1)(q+1)} = W_{\text{sync.}} Y^{t(q+1)} + C, \quad t \in \mathbb{N}_0.
\end{aligned}$$

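Now, we consider the term $\|Y^k - Y^*\|_\infty$ in order to investigate the rate of convergence for the synchronous algorithm. From the dynamics for the synchronous case, given by $Y^k = W_{\text{sync.}} Y^{k-1} + C$, we have

$$\begin{aligned}
\|Y^k - Y^*\|_\infty &= \|W_{\text{sync.}} Y^{k-1} + C - Y^*\|_\infty \\
&= \|W_{\text{sync.}} Y^{k-1} - W_{\text{sync.}} Y^*\|_\infty && \text{(by (4.2))} \\
&= \|W_{\text{sync.}} (W_{\text{sync.}} Y^{k-2} + C) - W_{\text{sync.}} Y^*\|_\infty \\
&= \|(W_{\text{sync.}})^2 Y^{k-2} + W_{\text{sync.}} (C - Y^*)\|_\infty \\
&= \|(W_{\text{sync.}})^2 (Y^{k-2} - Y^*)\|_\infty && \text{(by (4.2))} \\
&= \|(W_{\text{sync.}})^t (Y^0 - Y^*)\|_\infty \\
&\le \|(W_{\text{sync.}})^t\|_\infty \cdot \|Y^0 - Y^*\|_\infty,
\end{aligned}$$

where $k = t(q+1)$, $t \in \mathbb{N}_0$. Here the indices $k-1$ and $k-2$ refer to the states one and two synchronous updates before $Y^k$: since one update is completed every $q+1$ time steps, exactly $t$ updates have been applied by time $k = t(q+1)$, which gives the exponent $t$. Thus, we have the following upper bound on the rate of convergence for the synchronous algorithm:

$$\|Y^k - Y^*\|_\infty \le \|(W_{\text{sync.}})^t\|_\infty \cdot \|Y^0 - Y^*\|_\infty, \qquad k = t(q+1),\ t \in \mathbb{N}_0. \tag{5.1}$$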
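To make the effect of the idle time in (5.1) concrete, the following sketch evaluates the bound for a scalar toy problem. It is an illustration only: it assumes that $W_{\text{sync.}}$ reduces to the scalar $1 - R$, the values of `R`, `q` and `err0` are hypothetical, and between two synchronous updates the bound is simply held constant (the floor division), which is an interpretation rather than something stated in the text.

```python
# Sketch of the bound (5.1) for a scalar toy problem.
# Assumptions: W_sync is the scalar 1 - R; R, q and err0 are made-up values.

def sync_bound(k, q, R, err0):
    """Upper bound on |Y^k - Y*| when one update completes every q + 1 time steps."""
    t = k // (q + 1)               # number of completed synchronous updates by time k
    return abs(1.0 - R) ** t * err0

q, R, err0 = 4, 0.2, 1.0
for k in (0, 5, 10, 50, 100):
    print(k, sync_bound(k, q, R, err0))
```

With `q = 4`, the bound only improves at `k = 5, 10, 15, ...`, so the contraction factor per time step is effectively the $(q+1)$-th root of the factor per update.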
ii) Deterministic Asynchronous algorithm:

As described in section 2.3, the deterministic asynchronous algorithm takes advantage of the $q$-step prior value instead of waiting for all $x_i$ values to be gathered in the master node for synchronization. In this case, the system dynamics for the deterministic asynchronous scheme are given by

$$Y^{k+1} = W_{\text{det.async.}} Y^{k} + C,$$

where the matrix $W_{\text{det.async.}}$ is defined as

$$W_{\text{det.async.}} = \begin{bmatrix} I & 0 & 0 & \cdots & -R \\ I & 0 & 0 & \cdots & 0 \\ 0 & I & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix},$$


because in this case we have V* G 5^ in (2.13) for the deterministic asynchronous 
algorithm, leading to above system dynamics. 


Similarly to the process used to obtain (5.1), the upper bound on the rate of convergence for the deterministic asynchronous algorithm is derived as

$$\begin{aligned}
\|Y^k - Y^*\|_\infty &= \|(W_{\text{det.async.}})^k (Y^0 - Y^*)\|_\infty \\
&\le \|(W_{\text{det.async.}})^k\|_\infty \cdot \|Y^0 - Y^*\|_\infty, \qquad k \in \mathbb{N}_0.
\end{aligned} \tag{5.2}$$
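The bound (5.2) can be evaluated numerically by forming powers of $W_{\text{det.async.}}$ and taking the induced $\infty$-norm (maximum absolute row sum). The sketch below does this for a scalar dual variable, so each block $I$ and $R$ becomes a number. The values `q = 4`, `R = 0.2` and `err0 = 1.0` are hypothetical and chosen only so that the toy system is stable.

```python
# Sketch: evaluate ||W_det^k||_inf * ||Y^0 - Y*||_inf for a scalar toy problem.
# Assumptions: scalar blocks, made-up q, R and initial error err0.

def det_async_matrix(q, R):
    """First block row [I, 0, ..., -R]; the remaining rows shift the stored values."""
    W = [[0.0] * q for _ in range(q)]
    W[0][0] += 1.0
    W[0][q - 1] -= R
    for i in range(1, q):
        W[i][i - 1] = 1.0
    return W

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inf_norm(A):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def power_norms(W, k_max):
    """Return [||W^0||, ||W^1||, ..., ||W^k_max||] in the infinity norm."""
    n = len(W)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    norms = [inf_norm(P)]
    for _ in range(k_max):
        P = matmul(P, W)
        norms.append(inf_norm(P))
    return norms

q, R, err0 = 4, 0.2, 1.0
norms = power_norms(det_async_matrix(q, R), 200)
bounds = [n * err0 for n in norms]
for k in (0, 50, 100, 200):
    print(k, bounds[k])
```

Unlike the synchronous case, the exponent here is the time index $k$ itself, because an update is applied at every time step using delayed data.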


iii) Stochastic Asynchronous algorithm: 

Since the state $Y^k$ becomes a random vector in the stochastic asynchronous case, $\|Y^k - Y^*\|_\infty$ follows a distribution rather than taking a deterministic value, and such a distribution is difficult to analyze. Thus, we take the expectation of $Y^k$ with respect to the i.i.d. switching probability $\Pi$, and investigate the rate of convergence of $\|\mathbb{E}[Y^k] - Y^*\|_\infty$.

Under the assumption that the mean square convergence condition in Proposition 4.2 holds, the fixed-point value of $Y^k$ is deterministically given by $Y^*$. Therefore, it satisfies $\mathbb{E}[Y^*] = Y^*$. Taking the expectation in (4.5) results in $\mathbb{E}[Y^*] = Y^* = \mathbb{E}[W_{\sigma_k} Y^* + C] = \mathbb{E}[W_{\sigma_k}] Y^* + C = \big(\sum_{r=1}^{q} \pi_r W_r\big) Y^* + C$. By defining a new matrix $A := \sum_{r=1}^{q} \pi_r W_r$, we end up with

$$\mathbb{E}[Y^*] = Y^* = A Y^* + C. \tag{5.3}$$
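As a small illustration of how the averaged matrix is assembled, the sketch below forms $\sum_r \pi_r W_r$ for a scalar dual variable and checks two things: that its first row equals $[\,1 - \pi_1 R,\ -\pi_2 R,\ \ldots,\ -\pi_q R\,]$, and that the common fixed point also satisfies (5.3). The switching probability is the one reported in section 6; the values `R = 0.1` and `c = 0.5`, and the assumed form of $W_r$ in `mode_matrix`, are hypothetical.

```python
# Sketch: build A = sum_r pi_r * W_r and check its structure on a scalar toy problem.
# Assumptions: scalar blocks, made-up R and c, assumed mode structure in mode_matrix.

def mode_matrix(r, q, R):
    """Assumed form of W_r: the update uses the value delayed by r - 1 steps."""
    W = [[0.0] * q for _ in range(q)]
    W[0][0] += 1.0
    W[0][r - 1] -= R
    for i in range(1, q):
        W[i][i - 1] = 1.0
    return W

def expected_matrix(Pi, R):
    """Probability-weighted average of the mode matrices."""
    q = len(Pi)
    A = [[0.0] * q for _ in range(q)]
    for r, p in enumerate(Pi, start=1):
        W = mode_matrix(r, q, R)
        for i in range(q):
            for j in range(q):
                A[i][j] += p * W[i][j]
    return A

Pi = [0, 0, 0.08, 0.8, 0.11, 0.01, 0, 0]   # switching probability from section 6
R, c = 0.1, 0.5
q = len(Pi)
A = expected_matrix(Pi, R)

# Expected first row: [1 - pi_1 R, -pi_2 R, ..., -pi_q R]
first_row = [1.0 - Pi[0] * R] + [-p * R for p in Pi[1:]]
row_gap = max(abs(a - b) for a, b in zip(A[0], first_row))

# The vector with all entries c / R should satisfy Y = A Y + C
C = [c] + [0.0] * (q - 1)
Y_star = [c / R] * q
image = [sum(a * y for a, y in zip(row, Y_star)) + ci for row, ci in zip(A, C)]
fp_gap = max(abs(a - b) for a, b in zip(image, Y_star))

print(row_gap, fp_gap)
```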


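Then, the term $\|\mathbb{E}[Y^k] - Y^*\|_\infty$ becomes

$$\begin{aligned}
\|\mathbb{E}[Y^k] - Y^*\|_\infty
&= \big\|\mathbb{E}[W_{\sigma_{k-1}} Y^{k-1} + C] - Y^*\big\|_\infty \\
&= \Big\|\sum_{r=1}^{q} \mathbb{E}\big[W_{\sigma_{k-1}} Y^{k-1} + C \mid \sigma_{k-1} = r\big] \Pr(\sigma_{k-1} = r) - Y^*\Big\|_\infty \\
&= \Big\|\sum_{r=1}^{q} \pi_r W_r\, \mathbb{E}\big[Y^{k-1} \mid \sigma_{k-1} = r\big] + C - Y^*\Big\|_\infty \\
&= \Big\|\sum_{r=1}^{q} \pi_r W_r\, \mathbb{E}\big[Y^{k-1} \mid \sigma_{k-1} = r\big] - A Y^*\Big\|_\infty && \text{(by (5.3))} \\
&= \Big\|\underbrace{\Big(\sum_{r=1}^{q} \pi_r W_r\Big)}_{=A} \sum_{s=1}^{q} \mathbb{E}\big[W_{\sigma_{k-2}} Y^{k-2} + C \mid \sigma_{k-2} = s\big]\, \pi_s - A Y^*\Big\|_\infty \\
&= \Big\|A \Big(\sum_{s=1}^{q} \pi_s W_s\, \mathbb{E}\big[Y^{k-2} \mid \sigma_{k-2} = s\big] + C\Big) - A Y^*\Big\|_\infty \\
&= \Big\|A \Big(\sum_{s=1}^{q} \pi_s W_s\, \mathbb{E}\big[Y^{k-2} \mid \sigma_{k-2} = s\big]\Big) + A C - A (A Y^* + C)\Big\|_\infty && \text{(by (5.3))} \\
&= \Big\|A \Big(\sum_{s=1}^{q} \pi_s W_s\, \mathbb{E}\big[Y^{k-2} \mid \sigma_{k-2} = s\big]\Big) - A^2 Y^*\Big\|_\infty \\
&\;\;\vdots \\
&= \Big\|A^{k-1} \Big(\sum_{t=1}^{q} \pi_t W_t\, \mathbb{E}\big[Y^{0} \mid \sigma_{0} = t\big] + C\Big) - A^{k-1} Y^*\Big\|_\infty \\
&= \Big\|A^{k-1} \underbrace{\Big(\sum_{t=1}^{q} \pi_t W_t\Big)}_{=A} Y^{0} - A^{k} Y^*\Big\|_\infty \\
&= \big\|A^{k} (Y^0 - Y^*)\big\|_\infty \\
&\le \|A^{k}\|_\infty \cdot \|Y^0 - Y^*\|_\infty,
\end{aligned}$$

where we used the law of total probability, in the form of the law of total expectation, together with the fact that $Y^{k-1}$ is independent of $\sigma_{k-1}$ under i.i.d. switching.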

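Therefore, the rate of convergence for the stochastic asynchronous scheme is given by

$$\|\mathbb{E}[Y^k] - Y^*\|_\infty \le \|A^{k}\|_\infty \cdot \|Y^0 - Y^*\|_\infty, \tag{5.4}$$

where, with the implementation of Algorithm 1, the matrix $A := \sum_{r=1}^{q} \pi_r W_r$ has the following form:

$$A = \begin{bmatrix} I - \pi_1 R & -\pi_2 R & -\pi_3 R & \cdots & -\pi_q R \\ I & 0 & 0 & \cdots & 0 \\ 0 & I & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix} \tag{5.5}$$
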

N N 

, R-.= '£Mk) = 'E^j- 


2=1 
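The identity behind (5.4), namely $\mathbb{E}[Y^k] - Y^* = A^k (Y^0 - Y^*)$ under i.i.d. switching, can be verified exactly on a small example by enumerating every switching sequence instead of sampling. The sketch below is an illustration under assumptions: scalar dual variable, hypothetical `q = 3`, `R = 0.2`, `c = 0.5` and `Pi = [0.2, 0.6, 0.2]`, the assumed mode structure in `mode_matrix`, and the initial value 2 borrowed from the caption of Fig. 3.

```python
from itertools import product

# Sketch: exact check of E[Y^k] - Y* = A^k (Y^0 - Y*) by enumerating all q**k sequences.
# Assumptions: scalar blocks, made-up q, R, c and Pi, assumed form of W_r.

def mode_matrix(r, q, R):
    """Assumed form of W_r: the update uses the value delayed by r - 1 steps."""
    W = [[0.0] * q for _ in range(q)]
    W[0][0] += 1.0
    W[0][r - 1] -= R
    for i in range(1, q):
        W[i][i - 1] = 1.0
    return W

def matvec(W, Y):
    return [sum(w * y for w, y in zip(row, Y)) for row in W]

q, R, c, k = 3, 0.2, 0.5, 4
Pi = [0.2, 0.6, 0.2]
C = [c] + [0.0] * (q - 1)
Y0 = [2.0] * q
Y_star = [c / R] * q
modes = [mode_matrix(r, q, R) for r in range(1, q + 1)]

# Exact expectation: weight the end state of each switching sequence by its probability
mean = [0.0] * q
for seq in product(range(q), repeat=k):
    prob, Y = 1.0, list(Y0)
    for s in seq:
        prob *= Pi[s]
        Y = [wy + ci for wy, ci in zip(matvec(modes[s], Y), C)]
    mean = [m + prob * y for m, y in zip(mean, Y)]

# Prediction from the averaged matrix A = sum_r pi_r W_r
A = [[sum(Pi[s] * modes[s][i][j] for s in range(q)) for j in range(q)] for i in range(q)]
err = [y0 - ys for y0, ys in zip(Y0, Y_star)]
for _ in range(k):
    err = matvec(A, err)
predicted = [ys + e for ys, e in zip(Y_star, err)]

gap = max(abs(m - p) for m, p in zip(mean, predicted))
print(mean, predicted, gap)
```

The enumeration grows as $q^k$, so it is only practical for very small `q` and `k`; it is meant as a sanity check of the derivation, not as a way to compute the rate.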
Fig. 3. Convergence results for distributed quadratic programming with the stochastic asynchronous algorithm. The (green) solid lines represent the state trajectories of $y$ for a total of 100 Monte Carlo simulations (the initial value was deterministically given by $y(0) = 2$ in all cases). The (red) solid line with cross marks and the vertical bars denote the mean and the standard deviation of the trajectories, respectively.


6. Numerical Example. In this section, we test the proposed asynchronous algorithms on distributed QP problems with the dual decomposition technique. The test bed is an Intel(R) Core(TM) i7-4710HQ CPU, which has 4 cores and 8 threads (through Hyper-Threading Technology), with 8 GB of memory. Although the number of threads of this test bed is not very large, the system is sufficient to show the performance of the proposed asynchronous computing algorithms for distributed QP with dual decomposition. We implemented parallel processing through the OpenMP API (Application Program Interface), developed for direct multi-threaded, shared-memory parallelism.

Let us consider the following distributed QP problem; 



subject to $A_i x_i \le b_i$, $i = 1, 2, \ldots, N$.

The positive definite matrices $Q_i$, the matrices $A_i$, and the vectors $c_i$ and $b_i$ were generated with a pseudo-random number generator in C++. The dimensions of the matrices and vectors are set to $Q_i \in \mathbb{R}^{n \times n}$, $A_i \in \mathbb{R}^{1 \times n}$, $c_i \in \mathbb{R}^{n}$, and $b_i \in \mathbb{R}$, $i = 1, 2, \ldots, N$, where $n = 10$ and $N = 20000$. Thus, the computational burden of solving each distributed QP is low, whereas the total number of distributed QPs is extremely high. We let the buffer length $q = 8$ and the step size $\alpha_i = 0.27$, $\forall i$.

For this type of massively distributed QP problem, the time for synchronization may dominate the total time needed to solve the QP. In this case, asynchronous computing algorithms may lead to speedup by avoiding synchronization. We solved the above distributed QP problem with an implementation of the proposed stochastic asynchronous algorithm. In Fig. 3, 100 state trajectories of the dual variable $y$ are shown as (green) solid lines. Since the $y$-update is a stochastic process in the asynchronous algorithm, the trajectories differ from each other, resulting in a spread of the trajectories during the transient. The i.i.d.
switching probability H^ that describes asynchronous computing for each distributed 
Fig. 4. The rate of convergence results for distributed quadratic programming with three different schemes: synchronous (cross symbol); deterministic asynchronous (green dotted line); stochastic asynchronous (red solid line) algorithms. The horizontal axis shows iterations ($k$), from 0 to 200.


node is given by {TTj)i = J = Vb Then, by Theorem 4.3 the 

switching probability for the switched system in (3.2), denoted by $\Pi$, is computed as $\Pi = [0,\ 0,\ 0.08,\ 0.8,\ 0.11,\ 0.01,\ 0,\ 0]$. For this i.i.d. switching probability, we calculated the spectral radius given in (4.3), which is $\rho\big(\sum_{j=1}^{q} \pi_j (W_j \otimes W_j)\big) = 0.6147 < 1$.

Therefore, the convergence of the stochastic asynchronous algorithm is guaranteed in the mean square sense. The result in Fig. 3 also verifies the mean square convergence. The empirical mean and standard deviation are denoted by the (red) solid line with cross marks and the vertical bars, respectively. As the iteration step increases, the mean square error converges to zero (zero standard deviation).
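A spectral-radius test of this kind can be sketched without any numerical library. The code below assumes that (4.3) is the usual second-moment condition $\rho\big(\sum_j \pi_j (W_j \otimes W_j)\big) < 1$, and it uses a hypothetical toy system (scalar dual variable, `q = 3`, `R = 0.2`, `Pi = [0.2, 0.6, 0.2]`, assumed mode structure), so it does not reproduce the value 0.6147 reported above. Instead of computing eigenvalues, it relies on the fact that $\rho(M)^k \le \|M^k\|_\infty$, so $\|M^k\|_\infty^{1/k}$ is an upper bound on the spectral radius.

```python
# Sketch: mean square stability check via a Kronecker-product matrix.
# Assumptions: (4.3) is rho(sum_j pi_j W_j (x) W_j) < 1; toy values for q, R, Pi.

def mode_matrix(r, q, R):
    """Assumed form of W_r: the update uses the value delayed by r - 1 steps."""
    W = [[0.0] * q for _ in range(q)]
    W[0][0] += 1.0
    W[0][r - 1] -= R
    for i in range(1, q):
        W[i][i - 1] = 1.0
    return W

def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inf_norm(A):
    return max(sum(abs(x) for x in row) for row in A)

q, R = 3, 0.2
Pi = [0.2, 0.6, 0.2]
size = q * q
M = [[0.0] * size for _ in range(size)]
for r, p in enumerate(Pi, start=1):
    W = mode_matrix(r, q, R)
    K = kron(W, W)
    for i in range(size):
        for j in range(size):
            M[i][j] += p * K[i][j]

# Repeated squaring: after 9 squarings P equals M^512
P, k = M, 1
for _ in range(9):
    P = matmul(P, P)
    k *= 2

norm_k = inf_norm(P)
rho_upper = norm_k ** (1.0 / k)   # upper bound on the spectral radius of M
print(k, norm_k, rho_upper)
```

If `rho_upper` is below one, the second-moment matrix is a contraction in the long run and the toy system converges in the mean square sense; a value above one would be inconclusive, since it is only an upper bound.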

Next, we predict the rate of convergence for the three different schemes, i) the synchronous case, ii) the deterministic asynchronous case, and iii) the stochastic asynchronous case, in order to compare their performance. Using the results of section 5, we plotted the rate of convergence in Fig. 4. According to this upper bound on the rate of convergence, the stochastic asynchronous algorithm is advantageous for reducing the total computation time needed to find the optimal solution. The stochastic asynchronous scheme is up to 5 times faster than the synchronous algorithm and 2.5 times faster than the deterministic asynchronous algorithm.

In Fig. 5, we plotted the actual computation time needed to find the optimal solution for the three different schemes. For comparison, the computation time for the sequential case is also given as a reference. The termination criterion for the iteration is given by
the residual tolerance \y'^ — < 10“®. As shown in Fig. 5, the proposed stochastic 

algorithm achieves the fastest convergence in solving the distributed QP problem. This result coincides with the result on the rate of convergence, which indicates which scheme is best suited to the given QP problem even before the optimization problem is solved.

For the three different schemes, Table 1 and Fig. 6 present the computation time and the speedup, respectively, as we increase the number of threads in the test bed; the speedup in Fig. 6 is plotted from the values in Table 1. As the number of threads increases, performance degradation occurs in the synchronous case, whereas the deterministic and stochastic asynchronous algorithms show continued speedup. When the number of threads is 8, the stochastic asynchronous algorithm achieves a 5.81 times speedup over sequential computing, which is also 2.26 times faster than the synchronous algorithm.

Fig. 5. The convergence time comparison between sequential computing and three different schemes when the number of threads is given by 8 (maximum possible parallelization for the test bed).

Table 1
Comparison of total computation time for the dual variable to converge to the optimal value.

No. of     Synchronous           Det-Asynchronous      Sto-Asynchronous
Threads    Time       Speedup    Time       Speedup    Time       Speedup
#2         5.2012s    1.89       7.8774s    1.25       4.4422s    2.22
#3         4.0189s    2.45       5.8558s    1.68       3.1259s    3.15
#4         3.3848s    2.91       4.8792s    2.02       2.6342s    3.74
#5         3.3511s    2.94       4.3913s    2.24       2.3071s    4.27
#6         3.3547s    2.94       3.8129s    2.58       2.0249s    4.86
#7         3.5891s    2.74       3.4590s    2.85       1.8351s    5.37
#8         3.8340s    2.57       3.3260s    2.96       1.6933s    5.81
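The speedup figures for 8 threads can be cross-checked with a few lines of arithmetic. The sketch below uses only the 8-thread row of Table 1; the sequential time is not reported directly, so it is back-computed as time multiplied by speedup, which is an inference rather than a measured value. The parallel efficiency (speedup divided by the number of threads) is an additional, commonly used quantity that is not part of the original table.

```python
# Sketch: arithmetic cross-check of the 8-thread row of Table 1.
# The implied sequential time and the efficiency are derived quantities, not reported data.

threads = 8
time_s = {"sync": 3.8340, "det_async": 3.3260, "sto_async": 1.6933}
speedup = {"sync": 2.57, "det_async": 2.96, "sto_async": 5.81}

# Sequential time implied by each scheme: T_seq = T_parallel * speedup
implied_sequential = {name: time_s[name] * speedup[name] for name in time_s}

# Stochastic asynchronous versus synchronous, from times and from speedups
ratio_from_times = time_s["sync"] / time_s["sto_async"]
ratio_from_speedups = speedup["sto_async"] / speedup["sync"]

# Parallel efficiency: fraction of ideal linear speedup actually achieved
efficiency = {name: speedup[name] / threads for name in speedup}

print(implied_sequential)
print(round(ratio_from_times, 2), round(ratio_from_speedups, 2))
print(efficiency)
```

All three schemes imply a sequential time of roughly 9.8 seconds, and both ratios round to 2.26, which agrees with the factor quoted in the text.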

As described in Remark 3.1, computational complexity was the major concern when adopting the switched system framework for the analysis of the stochastic asynchronous algorithm. To circumvent this complexity issue, we applied Algorithm 1.
Thus, the number of switching modes has been drastically reduced from 
to q = 8, owing to Algorithm 1. Consequently, the analysis of stochastic asynchronous 
computing algorithm was carried out in a computationally efficient manner. 

Fig. 6. The speedup vs. number of threads.

7. Conclusion. In this paper, we studied the convergence of asynchronous distributed QP problems via the dual decomposition technique. To analyze the behavior of asynchrony in distributed and parallel computing, the switched system framework was introduced. Since the number of switching modes becomes large for massively asynchronous computing algorithms, we developed a new algorithm that drastically decreases the number of modes. By implementing the proposed method, the convergence condition in the mean square sense can be checked without any computational complexity issues. We also derived the rate of convergence for three different schemes (synchronous, deterministic asynchronous, and stochastic asynchronous algorithms), which analytically shows how fast the dual variable converges to the optimal solution. The numerical example with an implementation of asynchronous distributed QP using OpenMP supports the validity of the proposed methods.